Response to Mease and Wyner, Evidence Contrary to the Statistical View of Boosting, JMLR 9:131–156, 2008

Authors

  • Andreas Buja
  • Yoav Freund
Abstract

We thank the authors for writing a thought-provoking piece that may ruffle the feathers of recent orthodoxies in boosting. We also thank JMLR for publishing this article! Since the late 1990s, boosting has undergone the equivalent of a simultaneous X-ray, fMRI and PET exam, and the common view these days is that boosting is a kind of model fitting. As such, it is subjected to assumptions that are common in non-parametric statistics, such as: limiting the complexity of the base learner, building up complexity gradually by optimization, and preventing overfitting by early stopping or by regularizing the criterion with a complexity penalty. The theories backing this up use VC dimensions and other measures to show that, if the complexity of fits grows sufficiently slowly, asymptotic guarantees can be given. Into this orthodox scene Mease and Wyner throw one of the most original mind bogglers we have seen in a long time: “if stumps are causing overfitting, be willing to try larger trees.” In other words, if boosting a low-complexity base learner leads to overfit, try a higher-complexity base learner; boosting it might just not overfit. Empirical evidence backs up the claim. Is this counterintuitive wisdom so surprising? Yes, if seen from the point of view of orthodoxy, but less so when reviving some older memories. We may remind ourselves how boosting’s fame arose in statistics when the late Leo Breiman stated in a discussed 1998 Annals of Statistics article (based on a 1996 report) that boosting algorithms are “the most accurate ... off-the-shelf classifiers on a wide variety of data sets.” We should further remind ourselves what this praise was based on: boosting of the full CART algorithm by Breiman himself, and boosting of the full C4.5 algorithm by others. 
In other words, the base learners were anything but ‘weak’ in the sense of today’s orthodoxy, where ‘weak’ means ‘low complexity, low variance, and generally high bias.’ (Few people today use PAC theory’s untenable notion of weak learner, which was gently demolished by Breiman in the appendix of this same article.) Breiman’s (1998b, p. 802) major conclusion at the time was: “The main effect of both bagging and [boosting] is to reduce variance.” It appears, therefore, that his notion of ‘weak learner’ was one of ‘high complexity, high variance, and low bias’! This was before the low-variance orthodoxy set in and erased the memories of the early boosting experiences. Unfortunately, soon thereafter Breiman saw his own assumptions thrown into question when he learned from Schapire et al.’s (1998) work that excellent results could also be achieved by boosting





Publication date: 2008